192 research outputs found
webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser
<p>Abstract</p> <p>Background</p> <p>Phylogeny-aware progressive alignment has been found to perform well in phylogenetic alignment benchmarks and to produce superior alignments for the inference of selection on codon sequences. Its implementation in the PRANK alignment program package also allows modelling of complex evolutionary processes and inference of posterior probabilities for sequence sites evolving under each distinct scenario, either simultaneously with the alignment of sequences or as a post-processing step for an existing alignment. This has led to software with many advanced features, and users may find it difficult to generate optimal alignments, visualise the full information in their alignment results, or post-process these results, e.g. by objectively selecting subsets of alignment sites.</p> <p>Results</p> <p>We have created a web server called webPRANK that provides an easy-to-use interface to the PRANK phylogeny-aware alignment algorithm. The webPRANK server supports the alignment of DNA, protein and codon sequences as well as protein-translated alignment of cDNAs, and includes built-in structure models for the alignment of genomic sequences. The resulting alignments can be exported in various formats widely used in evolutionary sequence analyses. The webPRANK server also includes a powerful web-based alignment browser for the visualisation and post-processing of the results in the context of a cladogram relating the sequences, allowing (e.g.) removal of alignment columns with low posterior reliability. In addition to <it>de novo </it>alignments, webPRANK can be used for the inference of ancestral sequences with phylogenetically realistic gap patterns, and for the annotation and post-processing of existing alignments. The webPRANK server is freely available on the web at <url>http://tinyurl.com/webprank</url> .</p> <p>Conclusions</p> <p>The webPRANK server incorporates phylogeny-aware multiple sequence alignment, visualisation and post-processing in an easy-to-use web interface. It widens the user base of phylogeny-aware multiple sequence alignment and allows the performance of all alignment-related activity for small sequence analysis projects using only a standard web browser.</p
Effects of marker type and filtering criteria on QST-FST comparisons
Comparative studies of quantitative and neutral genetic differentiation (QST-FST tests) provide means to detect adaptive population differentiation. However, QST-FST tests can be overly liberal if the markers used deflate FST below its expectation, or overly conservative if methodological biases lead to inflated FST estimates. We investigated how marker type and filtering criteria for marker selection influence QST-FST comparisons through their effects on FST using simulations and empirical data on over 18 000 in silico genotyped microsatellites and 3.8 million single-locus polymorphism (SNP) loci from four populations of nine-spined sticklebacks (Pungitius pungitius). Empirical and simulated data revealed that FST decreased with increasing marker variability, and was generally higher with SNPs than with microsatellites. The estimated baseline FST levels were also sensitive to filtering criteria for SNPs: both minor alleles and linkage disequilibrium (LD) pruning influenced FST estimation, as did marker ascertainment. However, in the case of stickleback data used here where QST is high, the choice of marker type, their genomic location, ascertainment and filtering made little difference to outcomes of QST-FST tests. Nevertheless, we recommend that QST-FST tests using microsatellites should discard the most variable loci, and those using SNPs should pay attention to marker ascertainment and properly account for LD before filtering SNPs. This may be especially important when level of quantitative trait differentiation is low and levels of neutral differentiation high. © 2019 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.Peer reviewe
PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment
<p>Abstract</p> <p>Background</p> <p>The Monte Carlo simulation of sequence evolution is routinely used to assess the performance of phylogenetic inference methods and sequence alignment algorithms. Progress in the field of molecular evolution fuels the need for more realistic and hence more complex simulations, adapted to particular situations, yet current software makes unreasonable assumptions such as homogeneous substitution dynamics or a uniform distribution of indels across the simulated sequences. This calls for an extensible simulation framework written in a high-level functional language, offering new functionality and making it easy to incorporate further complexity.</p> <p>Results</p> <p><monospace>PhyloSim</monospace> is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, <monospace>PhyloSim</monospace> can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing <monospace>PhyloSim</monospace> to be adapted to specific needs.</p> <p>Conclusions</p> <p>Close integration with <monospace>R</monospace> and the wide range of features implemented offer unmatched flexibility, making it possible to simulate sequence evolution under a wide range of realistic settings. We believe that <monospace>PhyloSim</monospace> will be useful to future studies involving simulated alignments.</p
Evolutionary Sequence Analysis and Visualization with Wasabi
Wasabi is an open-source, web-based graphical environment for evolutionary sequence analysis and visualization, designed to work with multiple sequence alignments within their phylogenetic context. Its interactive user interface provides convenient access to external data sources and computational tools and is easily extendable with custom tools and pipelines using a plugin system. Wasabi stores intermediate editing and analysis steps as workflow histories and provides direct-access web links to datasets, allowing for reproducible, collaborative research, and easy dissemination of the results. In addition to shared analyses and installation-free usage, the web-based design allows Wasabi to be run as a cross-platform, stand-alone application and makes its integration to other web services straightforward. This chapter gives a detailed description and guidelines for the use of Wasabi's analysis environment. Example use cases will give step-by-step instructions for practical application of the public Wasabi, from quick data visualization to branched analysis pipelines and publishing of results. We end with a brief discussion of advanced usage of Wasabi, including command-line communication, interface extension, offline usage, and integration to local and public web services.Peer reviewe
Accurate reconstruction of insertion-deletion histories by statistical phylogenetics
The Multiple Sequence Alignment (MSA) is a computational abstraction that
represents a partial summary either of indel history, or of structural
similarity. Taking the former view (indel history), it is possible to use
formal automata theory to generalize the phylogenetic likelihood framework for
finite substitution models (Dayhoff's probability matrices and Felsenstein's
pruning algorithm) to arbitrary-length sequences. In this paper, we report
results of a simulation-based benchmark of several methods for reconstruction
of indel history. The methods tested include a relatively new algorithm for
statistical marginalization of MSAs that sums over a stochastically-sampled
ensemble of the most probable evolutionary histories. For mammalian
evolutionary parameters on several different trees, the single most likely
history sampled by our algorithm appears less biased than histories
reconstructed by other MSA methods. The algorithm can also be used for
alignment-free inference, where the MSA is explicitly summed out of the
analysis. As an illustration of our method, we discuss reconstruction of the
evolutionary histories of human protein-coding genes.Comment: 28 pages, 15 figures. arXiv admin note: text overlap with
arXiv:1103.434
Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment
Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique
in bioinformatics used to infer related residues among biological sequences.
Thus alignment accuracy is crucial to a vast range of analyses, often in ways
difficult to assess in those analyses. To compare the performance of different
aligners and help detect systematic errors in alignments, a number of
benchmarking strategies have been pursued. Here we present an overview of the
main strategies--based on simulation, consistency, protein structure, and
phylogeny--and discuss their different advantages and associated risks. We
outline a set of desirable characteristics for effective benchmarking, and
evaluate each strategy in light of them. We conclude that there is currently no
universally applicable means of benchmarking MSA, and that developers and users
of alignment tools should base their choice of benchmark depending on the
context of application--with a keen awareness of the assumptions underlying
each benchmarking strategy.Comment: Revie
Vitellogenin Underwent Subfunctionalization to Acquire Caste and Behavioral Specific Expression in the Harvester Ant Pogonomyrmex barbatus
PMCID: PMC3744404This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication
Evolutionary distances in the twilight zone -- a rational kernel approach
Phylogenetic tree reconstruction is traditionally based on multiple sequence
alignments (MSAs) and heavily depends on the validity of this information
bottleneck. With increasing sequence divergence, the quality of MSAs decays
quickly. Alignment-free methods, on the other hand, are based on abstract
string comparisons and avoid potential alignment problems. However, in general
they are not biologically motivated and ignore our knowledge about the
evolution of sequences. Thus, it is still a major open question how to define
an evolutionary distance metric between divergent sequences that makes use of
indel information and known substitution models without the need for a multiple
alignment. Here we propose a new evolutionary distance metric to close this
gap. It uses finite-state transducers to create a biologically motivated
similarity score which models substitutions and indels, and does not depend on
a multiple sequence alignment. The sequence similarity score is defined in
analogy to pairwise alignments and additionally has the positive semi-definite
property. We describe its derivation and show in simulation studies and
real-world examples that it is more accurate in reconstructing phylogenies than
competing methods. The result is a new and accurate way of determining
evolutionary distances in and beyond the twilight zone of sequence alignments
that is suitable for large datasets.Comment: to appear in PLoS ON
How reliably can we predict the reliability of protein structure predictions?
Background:
Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is predicted. Intensive research has been made to improve the quality of multiple alignments, since misaligned parts of the multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts due to the large number of mutations that separate the sequences.
Results:
Stochastic sequence alignment methods define a posterior distribution of possible multiple alignments. They can highlight the most likely alignment, and above that, they can give posterior probabilities for each alignment column. We made a comprehensive study on the HOMSTRAD database of structural alignments, predicting secondary structures in four different ways. We showed that alignment posterior probabilities correlate with the reliability of secondary structure predictions, though the strength of the correlation is different for different protocols. The correspondence between the reliability of secondary structure predictions and alignment posterior probabilities is the closest to the identity function when the secondary structure posterior probabilities are calculated from the posterior distribution of multiple alignments. The largest deviation from the identity function has been obtained in the case of predicting secondary structures from a single optimal pairwise alignment. We also showed that alignment posterior probabilities correlate with the 3D distances between C α amino acids in superimposed tertiary structures.
Conclusion:
Alignment posterior probabilities can be used to a priori detect errors in comparative models on the sequence alignment level. </p
- …